Guided Project: Analyzing movie reviews

Posted on Wed 08 July 2015 in Projects

In [1]:
import pandas

movies = pandas.read_csv("fandango_score_comparison.csv")
In [2]:
movies
Out[2]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.30 ... 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.00 ... 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.50 ... 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 ... 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.40 ... 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5
5 The Water Diviner (2015) 63 62 50 6.8 7.2 4.5 4.0 3.15 3.10 ... 3.60 3.0 3.0 2.5 3.5 3.5 34 39373 397 0.5
6 Irrational Man (2015) 42 53 53 7.6 6.9 4.0 3.5 2.10 2.65 ... 3.45 2.0 2.5 2.5 4.0 3.5 17 2680 252 0.5
7 Top Five (2014) 86 64 81 6.8 6.5 4.0 3.5 4.30 3.20 ... 3.25 4.5 3.0 4.0 3.5 3.5 124 16876 3223 0.5
8 Shaun the Sheep Movie (2015) 99 82 81 8.8 7.4 4.5 4.0 4.95 4.10 ... 3.70 5.0 4.0 4.0 4.5 3.5 62 12227 896 0.5
9 Love & Mercy (2015) 89 87 80 8.5 7.8 4.5 4.0 4.45 4.35 ... 3.90 4.5 4.5 4.0 4.5 4.0 54 5367 864 0.5
10 Far From The Madding Crowd (2015) 84 77 71 7.5 7.2 4.5 4.0 4.20 3.85 ... 3.60 4.0 4.0 3.5 4.0 3.5 35 12129 804 0.5
11 Black Sea (2015) 82 60 62 6.6 6.4 4.0 3.5 4.10 3.00 ... 3.20 4.0 3.0 3.0 3.5 3.0 37 16547 218 0.5
12 Leviathan (2014) 99 79 92 7.2 7.7 4.0 3.5 4.95 3.95 ... 3.85 5.0 4.0 4.5 3.5 4.0 145 22521 64 0.5
13 Unbroken (2014) 51 70 59 6.5 7.2 4.5 4.1 2.55 3.50 ... 3.60 2.5 3.5 3.0 3.5 3.5 218 77518 9443 0.4
14 The Imitation Game (2014) 90 92 73 8.2 8.1 5.0 4.6 4.50 4.60 ... 4.05 4.5 4.5 3.5 4.0 4.0 566 334164 8055 0.4
15 Taken 3 (2015) 9 46 26 4.6 6.1 4.5 4.1 0.45 2.30 ... 3.05 0.5 2.5 1.5 2.5 3.0 240 104235 6757 0.4
16 Ted 2 (2015) 46 58 48 6.5 6.6 4.5 4.1 2.30 2.90 ... 3.30 2.5 3.0 2.5 3.5 3.5 197 49102 6437 0.4
17 Southpaw (2015) 59 80 57 8.2 7.8 5.0 4.6 2.95 4.00 ... 3.90 3.0 4.0 3.0 4.0 4.0 128 23561 5597 0.4
18 Night at the Museum: Secret of the Tomb (2014) 50 58 47 5.8 6.3 4.5 4.1 2.50 2.90 ... 3.15 2.5 3.0 2.5 3.0 3.0 103 50291 5445 0.4
19 Pixels (2015) 17 54 27 5.3 5.6 4.5 4.1 0.85 2.70 ... 2.80 1.0 2.5 1.5 2.5 3.0 246 19521 3886 0.4
20 McFarland, USA (2015) 79 89 60 7.2 7.5 5.0 4.6 3.95 4.45 ... 3.75 4.0 4.5 3.0 3.5 4.0 59 13769 3364 0.4
21 Insidious: Chapter 3 (2015) 59 56 52 6.9 6.3 4.5 4.1 2.95 2.80 ... 3.15 3.0 3.0 2.5 3.5 3.0 115 25134 3276 0.4
22 The Man From U.N.C.L.E. (2015) 68 80 55 7.9 7.6 4.5 4.1 3.40 4.00 ... 3.80 3.5 4.0 3.0 4.0 4.0 144 22104 2686 0.4
23 Run All Night (2015) 60 59 59 7.3 6.6 4.5 4.1 3.00 2.95 ... 3.30 3.0 3.0 3.0 3.5 3.5 141 50438 2066 0.4
24 Trainwreck (2015) 85 74 75 6.0 6.7 4.5 4.1 4.25 3.70 ... 3.35 4.5 3.5 4.0 3.0 3.5 169 27380 8381 0.4
25 Selma (2014) 99 86 89 7.1 7.5 5.0 4.6 4.95 4.30 ... 3.75 5.0 4.5 4.5 3.5 4.0 316 45344 7025 0.4
26 Ex Machina (2015) 92 86 78 7.9 7.7 4.5 4.1 4.60 4.30 ... 3.85 4.5 4.5 4.0 4.0 4.0 672 154499 3458 0.4
27 Still Alice (2015) 88 85 72 7.8 7.5 4.5 4.1 4.40 4.25 ... 3.75 4.5 4.5 3.5 4.0 4.0 153 57123 1258 0.4
28 Wild Tales (2014) 96 92 77 8.8 8.2 4.5 4.1 4.80 4.60 ... 4.10 5.0 4.5 4.0 4.5 4.0 107 50285 235 0.4
29 The End of the Tour (2015) 92 89 84 7.5 7.9 4.5 4.1 4.60 4.45 ... 3.95 4.5 4.5 4.0 4.0 4.0 19 1320 121 0.4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
116 Clouds of Sils Maria (2015) 89 67 78 7.1 6.8 3.5 3.4 4.45 3.35 ... 3.40 4.5 3.5 4.0 3.5 3.5 36 11392 162 0.1
117 Testament of Youth (2015) 81 79 77 7.9 7.3 4.0 3.9 4.05 3.95 ... 3.65 4.0 4.0 4.0 4.0 3.5 15 5495 127 0.1
118 Infinitely Polar Bear (2015) 80 76 64 7.9 7.2 4.0 3.9 4.00 3.80 ... 3.60 4.0 4.0 3.0 4.0 3.5 8 1062 124 0.1
119 Phoenix (2015) 99 81 91 8.0 7.2 3.5 3.4 4.95 4.05 ... 3.60 5.0 4.0 4.5 4.0 3.5 21 3687 70 0.1
120 The Wolfpack (2015) 84 73 75 7.0 7.1 3.5 3.4 4.20 3.65 ... 3.55 4.0 3.5 4.0 3.5 3.5 8 1488 66 0.1
121 The Stanford Prison Experiment (2015) 84 87 68 8.5 7.1 4.0 3.9 4.20 4.35 ... 3.55 4.0 4.5 3.5 4.5 3.5 6 950 51 0.1
122 Tangerine (2015) 95 86 86 7.3 7.4 4.0 3.9 4.75 4.30 ... 3.70 5.0 4.5 4.5 3.5 3.5 14 696 36 0.1
123 Magic Mike XXL (2015) 62 64 60 5.4 6.3 4.5 4.4 3.10 3.20 ... 3.15 3.0 3.0 3.0 2.5 3.0 52 11937 9363 0.1
124 Home (2015) 45 65 55 7.3 6.7 4.5 4.4 2.25 3.25 ... 3.35 2.5 3.5 3.0 3.5 3.5 177 41158 7705 0.1
125 The Wedding Ringer (2015) 27 66 35 3.3 6.7 4.5 4.4 1.35 3.30 ... 3.35 1.5 3.5 2.0 1.5 3.5 126 37292 6506 0.1
126 Woman in Gold (2015) 52 81 51 7.2 7.4 4.5 4.4 2.60 4.05 ... 3.70 2.5 4.0 2.5 3.5 3.5 72 17957 2435 0.1
127 The Last Five Years (2015) 60 60 60 6.9 6.0 4.5 4.4 3.00 3.00 ... 3.00 3.0 3.0 3.0 3.5 3.0 20 4110 99 0.1
128 Mission: Impossible – Rogue Nation (2015) 92 90 75 8.0 7.8 4.5 4.4 4.60 4.50 ... 3.90 4.5 4.5 4.0 4.0 4.0 362 82579 8357 0.1
129 Amy (2015) 97 91 85 8.8 8.0 4.5 4.4 4.85 4.55 ... 4.00 5.0 4.5 4.5 4.5 4.0 60 5630 729 0.1
130 Jurassic World (2015) 71 81 59 7.0 7.3 4.5 4.5 3.55 4.05 ... 3.65 3.5 4.0 3.0 3.5 3.5 1281 241807 34390 0.0
131 Minions (2015) 54 52 56 5.7 6.7 4.0 4.0 2.70 2.60 ... 3.35 2.5 2.5 3.0 3.0 3.5 204 55895 14998 0.0
132 Max (2015) 35 73 47 5.9 7.0 4.5 4.5 1.75 3.65 ... 3.50 2.0 3.5 2.5 3.0 3.5 15 5444 3412 0.0
133 Paul Blart: Mall Cop 2 (2015) 5 36 13 2.4 4.3 3.5 3.5 0.25 1.80 ... 2.15 0.5 2.0 0.5 1.0 2.0 211 15004 3054 0.0
134 The Longest Ride (2015) 31 73 33 4.8 7.2 4.5 4.5 1.55 3.65 ... 3.60 1.5 3.5 1.5 2.5 3.5 49 25214 2603 0.0
135 The Lazarus Effect (2015) 14 23 31 4.9 5.2 3.0 3.0 0.70 1.15 ... 2.60 0.5 1.0 1.5 2.5 2.5 62 17691 1651 0.0
136 The Woman In Black 2 Angel of Death (2015) 22 25 42 4.4 4.9 3.0 3.0 1.10 1.25 ... 2.45 1.0 1.5 2.0 2.0 2.5 55 14873 1333 0.0
137 Danny Collins (2015) 77 75 58 7.1 7.1 4.0 4.0 3.85 3.75 ... 3.55 4.0 4.0 3.0 3.5 3.5 33 11206 531 0.0
138 Spare Parts (2015) 52 83 50 7.1 7.2 4.5 4.5 2.60 4.15 ... 3.60 2.5 4.0 2.5 3.5 3.5 7 47377 450 0.0
139 Serena (2015) 18 25 36 5.3 5.4 3.0 3.0 0.90 1.25 ... 2.70 1.0 1.5 2.0 2.5 2.5 19 12165 50 0.0
140 Inside Out (2015) 98 90 94 8.9 8.6 4.5 4.5 4.90 4.50 ... 4.30 5.0 4.5 4.5 4.5 4.5 807 96252 15749 0.0
141 Mr. Holmes (2015) 87 78 67 7.9 7.4 4.0 4.0 4.35 3.90 ... 3.70 4.5 4.0 3.5 4.0 3.5 33 7367 1348 0.0
142 '71 (2015) 97 82 83 7.5 7.2 3.5 3.5 4.85 4.10 ... 3.60 5.0 4.0 4.0 4.0 3.5 60 24116 192 0.0
143 Two Days, One Night (2014) 97 78 89 8.8 7.4 3.5 3.5 4.85 3.90 ... 3.70 5.0 4.0 4.5 4.5 3.5 123 24345 118 0.0
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.00 4.05 ... 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0
145 Kumiko, The Treasure Hunter (2015) 87 63 68 6.4 6.7 3.5 3.5 4.35 3.15 ... 3.35 4.5 3.0 3.5 3.0 3.5 19 5289 41 0.0

146 rows × 22 columns

In [3]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.hist(movies["Fandango_Stars"])
Out[3]:
(array([ 12.,   0.,  27.,   0.,   0.,  41.,   0.,  55.,   0.,  11.]),
 array([ 3. ,  3.2,  3.4,  3.6,  3.8,  4. ,  4.2,  4.4,  4.6,  4.8,  5. ]),
 <a list of 10 Patch objects>)
In [4]:
plt.hist(movies["Metacritic_norm_round"])
Out[4]:
(array([  1.,   2.,  20.,  14.,   0.,  22.,  27.,  20.,  25.,  15.]),
 array([ 0.5,  0.9,  1.3,  1.7,  2.1,  2.5,  2.9,  3.3,  3.7,  4.1,  4.5]),
 <a list of 10 Patch objects>)

Fandango vs Metacritic Scores

No Fandango rating falls below 3.0 stars. Fandango ratings cluster around 4.0 and 4.5, whereas Metacritic ratings cluster around 3.0 and 3.5.
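The two histograms above use different automatic bin edges, which makes them hard to compare by eye. As a rough check, the counts can be read off the `Out[3]` and `Out[4]` arrays and tabulated per half-star rating. The mapping from bin to rating below is an assumption based on where each half-star value falls within the printed bin edges, and the names are illustrative.

```python
# Counts per half-star rating, read off the histogram outputs above.
# Assumption: each non-empty bin holds exactly one half-star value.
fandango_counts = {3.0: 12, 3.5: 27, 4.0: 41, 4.5: 55, 5.0: 11}
metacritic_counts = {0.5: 1, 1.0: 2, 1.5: 20, 2.0: 14, 2.5: 22,
                     3.0: 27, 3.5: 20, 4.0: 25, 4.5: 15}

def summarize(counts):
    """Return total, lowest, highest, most common rating, and mean."""
    total = sum(counts.values())
    mean = sum(rating * n for rating, n in counts.items()) / total
    most_common = max(counts, key=counts.get)
    return {"total": total, "min": min(counts), "max": max(counts),
            "mode": most_common, "mean": mean}

fandango_summary = summarize(fandango_counts)
metacritic_summary = summarize(metacritic_counts)
print("Fandango:  ", fandango_summary)
print("Metacritic:", metacritic_summary)
```

Both tables sum to 146 films, and the means they imply agree with the values printed by `mean()` in the next cell, which suggests the bin-to-rating mapping is right.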

In [5]:
import numpy

f_mean = movies["Fandango_Stars"].mean()
m_mean = movies["Metacritic_norm_round"].mean()
f_std = movies["Fandango_Stars"].std()
m_std = movies["Metacritic_norm_round"].std()
f_median = movies["Fandango_Stars"].median()
m_median = movies["Metacritic_norm_round"].median()

print(f_mean)
print(m_mean)
print(f_std)
print(m_std)
print(f_median)
print(m_median)
4.08904109589
2.97260273973
0.540385977979
0.990960561374
4.0
3.0

Fandango vs Metacritic Methodology

Fandango appears to inflate ratings and isn't transparent about how it calculates and aggregates them. Metacritic publishes each individual critic rating and is transparent about how it aggregates them into a final rating.

Fandango vs Metacritic number differences

The median Metacritic score is higher than the mean Metacritic score because a few very low reviews "drag down" the mean. The median Fandango score is lower than the mean Fandango score because a few very high ratings "drag up" the mean.
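One rough way to quantify this pull is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, computed here from the summary statistics printed above. This is a sketch rather than part of the original analysis, and the helper name is made up for illustration.

```python
# Summary statistics copied from the printed output above.
f_mean, f_median, f_std = 4.08904109589, 4.0, 0.540385977979
m_mean, m_median, m_std = 2.97260273973, 3.0, 0.990960561374

def median_skewness(mean, median, std):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std

f_skew = median_skewness(f_mean, f_median, f_std)
m_skew = median_skewness(m_mean, m_median, m_std)

# Positive: the mean sits above the median. Negative: it sits below.
print(round(f_skew, 3), round(m_skew, 3))
```

The Metacritic value is close to zero, so its mean and median differ only slightly; the Fandango gap is the more pronounced of the two.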

Fandango ratings are clustered between 3 and 5 and have a much narrower range than Metacritic ratings, which span nearly the full 0 to 5 scale.

Fandango ratings are generally higher than Metacritic ratings.

These differences may be due to movie studio influence on Fandango ratings and to the fact that Fandango calculates its ratings in a hidden way.

In [6]:
plt.scatter(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
Out[6]:
<matplotlib.collections.PathCollection at 0x109aa4390>
In [7]:
movies["fm_diff"] = numpy.abs(movies["Metacritic_norm_round"] - movies["Fandango_Stars"])
In [8]:
movies.sort_values(by="fm_diff", ascending=False).head(5)
Out[8]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference fm_diff
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.20 ... 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5 4.0
85 Little Boy (2015) 20 81 30 5.9 7.4 4.5 4.3 1.00 4.05 ... 1.0 4.0 1.5 3.0 3.5 38 5927 811 0.2 3.0
47 Annie (2014) 27 61 33 4.8 5.2 4.5 4.2 1.35 3.05 ... 1.5 3.0 1.5 2.5 2.5 108 19222 6835 0.3 3.0
19 Pixels (2015) 17 54 27 5.3 5.6 4.5 4.1 0.85 2.70 ... 1.0 2.5 1.5 2.5 3.0 246 19521 3886 0.4 3.0
134 The Longest Ride (2015) 31 73 33 4.8 7.2 4.5 4.5 1.55 3.65 ... 1.5 3.5 1.5 2.5 3.5 49 25214 2603 0.0 3.0

5 rows × 23 columns

In [9]:
from scipy.stats import pearsonr

r_value, p_value = pearsonr(movies["Fandango_Stars"], movies["Metacritic_norm_round"])

r_value
Out[9]:
0.17844919073895918

Fandango and Metacritic correlation

The low correlation between Fandango and Metacritic scores indicates that Fandango scores aren't just inflated; they are fundamentally different. For whatever reason, it appears that Fandango both inflates scores overall and inflates them differently depending on the movie.
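For reference, the Pearson coefficient is the summed product of the two columns' deviations from their means, divided by the square root of the product of their summed squared deviations. The sketch below computes it by hand on the first ten rows shown in the table above (not the full 146 films, so the result differs from `pearsonr`), then squares the reported r to see how much variance a linear fit would explain. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Rows 0-9 of the table above: Metacritic_norm_round and Fandango_Stars.
metacritic = [3.5, 3.5, 3.0, 1.0, 1.5, 2.5, 2.5, 4.0, 4.0, 4.0]
fandango = [5.0, 5.0, 5.0, 5.0, 3.5, 4.5, 4.0, 4.0, 4.5, 4.5]
sample_r = pearson_r(metacritic, fandango)

# r reported by scipy for all 146 films; r squared is the share of
# variance in one score that a straight-line fit on the other explains.
reported_r = 0.17844919073895918
r_squared = reported_r ** 2
print(round(sample_r, 3), round(r_squared, 3))
```

An r of about 0.18 corresponds to an r² of roughly 0.03, so a linear relationship with Metacritic accounts for only about 3% of the variation in Fandango stars.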

In [10]:
from scipy.stats import linregress

slope, intercept, r_value, p_value, stderr_slope = linregress(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
In [11]:
pred = 3 * slope + intercept

pred
Out[11]:
4.0917071528212032
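The prediction can be cross-checked without rerunning the regression. For a simple least-squares line, the slope equals r × (std of y / std of x) and the line passes through the point of means, so both coefficients follow from numbers already printed above. A sketch under those identities, with illustrative variable names:

```python
# Values printed earlier in the notebook.
r_value = 0.17844919073895918  # Pearson r from In [9]
f_mean, m_mean = 4.08904109589, 2.97260273973
f_std, m_std = 0.540385977979, 0.990960561374

# x is Metacritic_norm_round, y is Fandango_Stars, matching linregress.
slope_check = r_value * f_std / m_std
intercept_check = f_mean - slope_check * m_mean

# Predicted Fandango stars for a film with a Metacritic score of 3.0.
pred_check = 3 * slope_check + intercept_check
print(round(slope_check, 4), round(intercept_check, 4), round(pred_check, 4))
```

The result agrees with the `pred` value above to several decimal places. The small slope means that each extra Metacritic star is associated with only about a tenth of a star on Fandango.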

Finding Residuals

In [12]:
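# Predict Fandango stars at the two ends of the plotted Metacritic range.
pred_1 = 1 * slope + intercept
pred_5 = 5 * slope + intercept
plt.scatter(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
# Draw the fitted regression line through the two predicted points.
plt.plot([1,5],[pred_1,pred_5])
plt.xlim(1,5)
plt.show()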
Out[12]:
<matplotlib.collections.PathCollection at 0x10b35db00>
Out[12]:
[<matplotlib.lines.Line2D at 0x10b316780>]
Out[12]:
(1, 5)
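A residual is the actual value minus the value the regression line predicts. As a sketch, the block below computes residuals for the first five films in the table, using rounded coefficients implied by the summary statistics printed earlier (slope ≈ 0.0973, intercept ≈ 3.7998). These are approximations, not the exact `linregress` output, and the names are illustrative.

```python
# Approximate coefficients implied by the printed r, means, and standard
# deviations (assumption: rounded to four decimals).
approx_slope = 0.0973
approx_intercept = 3.7998

# (film, Metacritic_norm_round, Fandango_Stars) for rows 0-4 of the table.
films = [
    ("Avengers: Age of Ultron (2015)", 3.5, 5.0),
    ("Cinderella (2015)", 3.5, 5.0),
    ("Ant-Man (2015)", 3.0, 5.0),
    ("Do You Believe? (2015)", 1.0, 5.0),
    ("Hot Tub Time Machine 2 (2015)", 1.5, 3.5),
]

residuals = {}
for title, metacritic, fandango in films:
    predicted = approx_slope * metacritic + approx_intercept
    # Positive residual: Fandango rates the film above the fitted line.
    residuals[title] = fandango - predicted

# List films from the largest positive residual to the most negative.
for title, resid in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{resid:+.2f}  {title}")
```

Films far above the line are the ones Fandango rates much more generously than their Metacritic score alone would predict.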